Skip to content

feat: implement remote gguf inspection on hugging face#39

Merged
pipe1os merged 6 commits into
mainfrom
advisor/004-remote-gguf-inspection
Jun 27, 2026
Merged

feat: implement remote gguf inspection on hugging face#39
pipe1os merged 6 commits into
mainfrom
advisor/004-remote-gguf-inspection

Conversation

@pipe1os

@pipe1os pipe1os commented Jun 27, 2026

Copy link
Copy Markdown
Owner

Summary

This PR implements remote GGUF inspection for Hugging Face repositories, enabling the CLI to parse and calculate VRAM footprints for remote GGUF weights without downloading the full files.

Motivation & Context

Prior to this change, remote inspection on Hugging Face only supported SafeTensors format. Attempting to parse GGUF repositories resulted in a crash. This change introduces support for single and multi-quantization GGUF repositories, showing a comparison table of available files when no specific quantization is targeted.

Type of Change

  • New feature (non-breaking change which adds functionality)

How Has This Been Tested?

The changes were tested using pytest:

  • Unit tests added to tests/test_parsers.py to cover remote GGUF single, group, and targeted parsing with mocked HTTP responses.

  • Unit tests added to tests/test_cli.py to cover CLI routing and print_model_info group rendering.

  • Manual testing against live HF GGUF repository (bartowski/Meta-Llama-3-8B-Instruct-GGUF).

  • Unit tests

  • Integration tests

  • Manual testing

Screenshots (if appropriate)

N/A

Checklist

  • My code follows the code style of this project.
  • My commit messages follow the Conventional Commits format, are lowercase, imperative, and specific.
  • I have updated the documentation accordingly (if applicable).
  • I have added tests to cover my changes.
  • All new and existing tests passed.

@coderabbitai

coderabbitai Bot commented Jun 27, 2026

Copy link
Copy Markdown

Warning

Review limit reached

@pipe1os, we couldn't start this review because you've reached your PR review rate limit.

More reviews will be available in 13 minutes and 41 seconds. Learn how PR review limits work.

To continue reviewing without waiting, enable usage-based billing in the billing tab.

⌛ How to resolve this issue?

After more reviews become available, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

To avoid repeated limits, reduce automatic review volume by pausing incremental auto-reviews earlier, using label-based review opt-in, excluding WIP or generated PR titles, or requesting reviews manually when the PR is ready. If your team needs uninterrupted high-volume reviews, an organization admin can enable usage-based credits.

🚦 How do rate limits work?

CodeRabbit enforces per-developer PR review limits for each organization. Most developers receive the normal plan review availability.

For paid Pro and Pro+ PR reviews, CodeRabbit uses adaptive limits for sustained high-volume activity. When a developer's recent PR review activity reaches the 95th percentile or higher among CodeRabbit users, additional reviews become available more gradually as earlier reviews age out of the rolling window.

Please see our Fair Usage Limits Policy for further information.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro Plus

Run ID: 8befa85b-8ade-48bb-a859-8228abd5c930

📥 Commits

Reviewing files that changed from the base of the PR and between f9f730c and 357ee16.

📒 Files selected for processing (7)
  • README.md
  • src/modelinfo/cli.py
  • src/modelinfo/parsers/gguf.py
  • src/modelinfo/parsers/huggingface.py
  • src/modelinfo/ui.py
  • tests/test_cli.py
  • tests/test_parsers.py
✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch advisor/004-remote-gguf-inspection

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands.

@codacy-production

codacy-production Bot commented Jun 27, 2026

Copy link
Copy Markdown

Up to standards ✅

🟢 Issues 0 issues

Results:
0 new issues

View in Codacy

🟢 Metrics 72 complexity · 8 duplication

Metric Results
Complexity 72
Duplication 8

View in Codacy

AI Reviewer: first review requested successfully. AI can make mistakes. Always validate suggestions.

Run reviewer

TIP This summary will be updated as you push new changes.

@codacy-production codacy-production Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR successfully implements remote GGUF inspection for Hugging Face repositories using HTTP Range requests, which is a significant functional addition. However, the PR is currently not up to standards according to Codacy analysis.

There are two primary areas of concern: high complexity and potential stability issues. Specifically, fetch_huggingface_repo has seen a massive complexity increase (+37) that makes it harder to maintain. Furthermore, the RemoteFileStream implementation lacks buffer management, which could lead to high memory usage when encountering malformed headers. There are also implementation gaps in the UI where user-provided GPU utilization parameters are ignored in the comparison table logic. While error handling for gated models and missing repositories was implemented, it lacks automated test verification.

About this PR

  • The PR adds handling for 401 Unauthorized (gated models) and 404 Not Found scenarios, but these paths are not covered by the current test suite. Please add unit tests to verify the behavior of the remote parser when these HTTP errors occur.
1 comment outside of the diff
src/modelinfo/cli.py

line 133 🟡 MEDIUM RISK
This function is too complex (16). It handles path resolution, remote repo detection, and multiple file format logic in one block. Refactoring this into a dispatcher pattern would improve maintainability.

Test suggestions

  • Remote GGUF single file header parsing via range-based streaming
  • Remote GGUF multi-file (group) repository detection and metadata aggregation
  • Targeting a specific quantization file within a remote Hugging Face repository
  • CLI UI rendering of the GGUF quantization comparison table with VRAM estimates
  • Handling of 401 Unauthorized (Gated models) and 404 Not Found (Missing repos) for GGUF paths
Prompt proposal for missing tests
Consider implementing these tests if applicable:
1. Handling of 401 Unauthorized (Gated models) and 404 Not Found (Missing repos) for GGUF paths
Low confidence findings
  • The UI uses a hardcoded 600MB overhead fallback for GGUF groups (line 82). This fixed value may lead to inaccurate 'Fits' calculations for models that deviate significantly from standard overhead patterns.

TIP Improve review quality by adding custom instructions
TIP How was this review? Give us feedback

Comment thread src/modelinfo/ui.py Outdated
row_data = [filename, file_size_str, kv_cache_str, total_vram_str]
if show_fits:
utilization = total_vram_bytes / (max_vram_gb * 1024**3) if max_vram_gb > 0 else 2.0
if utilization <= 0.90:

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 MEDIUM RISK

The 'Fits' column logic should use the 'gpu_util' parameter instead of a hardcoded 0.90 threshold to respect the user's CLI configuration.

pass


def fetch_huggingface_repo(repo_id: str, fetch_tensors: bool = False, timeout: float = 10.0) -> Tuple[Dict[str, Any], Dict[str, Any] | None, str, float]:

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 MEDIUM RISK

Suggestion: The function fetch_huggingface_repo has grown to ~150 lines and contains several branching paths for different repository types. Refactor this by splitting the logic into smaller private helper methods (e.g., _handle_gguf_repo, _handle_safetensors_repo) for each model format detection path.

self.url = url
self.chunk_size = chunk_size
self.timeout = timeout
self.buffer = b""

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 MEDIUM RISK

Suggestion: The buffer in RemoteFileStream accumulates all read data and is never cleared. Since this stream is primarily used for header parsing, consider implementing a sliding window or limiting the maximum buffer size to prevent high memory usage on malformed files.

Comment thread tests/test_cli.py Outdated
assert len(info["tensors"]["__metadata__"]["gguf_variants"]) == 2


def test_print_model_info_gguf_group(capsys):

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚪ LOW RISK

This test method exceeds the length limit (63 lines). Move the large tensors and footprint dictionary definitions into reusable pytest fixtures or a local helper function.

Comment on lines +148 to +149
except Exception:
raise

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚪ LOW RISK

Nitpick: This exception handler is redundant and can be removed as it only re-raises the exception.

Comment thread src/modelinfo/parsers/huggingface.py Outdated

url = f"{_get_hf_endpoint()}/{real_repo_id}/resolve/main/{target_filename}"
stream = RemoteFileStream(url, timeout=timeout)
from modelinfo.parsers.gguf import parse_gguf_header

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚪ LOW RISK

Nitpick: The import of parse_gguf_header is duplicated multiple times. Move it to the top of the fetch_huggingface_repo function to keep the logic clean.

@pipe1os pipe1os changed the title implement remote gguf inspection on hugging face feat: implement remote gguf inspection on hugging face Jun 27, 2026
@pipe1os pipe1os self-assigned this Jun 27, 2026
@pipe1os pipe1os merged commit 0e82634 into main Jun 27, 2026
11 checks passed
@pipe1os pipe1os deleted the advisor/004-remote-gguf-inspection branch June 27, 2026 15:33
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant